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Abstract. The difTerence between consecutive daily Sunspot Numbers was analysed. 
Its distribution was approximated on a large time scale with an exponential law. In 
order to verify this approximation a Maximum Entropy distribution was generated by 
a modified version of the Simulated Annealing algorithm. The exponential approximation 
holds for the generated distribution too. The exponential law is characteristic for time 
scales covering whole cycles and it is mostly a characteristic of the Sunspot Number 
fluctuations and not of its average variation. 
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1. Introduction 

The oldest and most used index for solar activity is the Sunspot Number 
{SN) or Wolf number describing the number of sunspots S and groups of 
sunspots G that can be observed on the Sun: SN = WG + S. The SN index 
is computed daily and there is a complete record of values SNi since the 
1850's, spanning Solar Cycles 10-23. For the following we used the difference 
between daily SN numbers: 

DSN, = SN,-SNi^i, (1) 

where i is the count of days. The SN values have been obtained from the 
Solar Influences Data Analysis Center of the Royal Observatory of Belgium 
(SIDC, 2010). The graph of DSN as a function of time is presented in 
Figure 1. The distribution of DSN is linked to the mechanisms that generate 
sunspots with a periodicity of about 11 years. While there are deterministic 
mechanisms involved, there are also some stochastic mechanisms, responsible 
especially for the high day to day variability of SN. Thus, a statistical analy- 
sis of DSN may reveal some characteristics of the latter type of mechanism. 
We will further refer to DSN values and their characteristics determined 
from measured SN values as "empirical" values or characteristics. 

The distribution of DSN has been reconstructed both as a histogram and 
as a Maximum Entropy Method (MEM) distribution (Jaynes, 1957a, 1957b, 
1963; Uffink, 1995). The MEM distribution was built by imposing some 
local conditions and was determined using a modified form of the Simulated 
Annealing (SA) heuristic search algorithm (Kirkpatrick, Gelatt, and Vecchi, 
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Figure 1. Evolution of DSN (black points) along solar cycles (average SN - grey line, not 
to scale). The graph starts in the middle of the 12*'' cycle and spans for the last 14 cycles. 

1983; Cerny, 1985; Ingber, 1993). The details of the MEM distribution are 
presented in Appendix A. 

The values of DSN as computed from SN values are integer numbers. 
However, we argue that both SN and DSN values can be thought as real 
numbers, where the fractional part expresses the degree with which a cer- 
tain "average sunspot" is realised. Thus, the maximum entropy distribution 
was extended into the set of real numbers. The distributions obtained were 
approximated on certain ranges of DSN with exponential functions of the 
form: 



where m is a distribution parameter and Z is a constant such that p has 
integral equal to 1. 

The SN time series have both a deterministic component, which may be 
taken as the monthly or yearly average of SN, and a stochastic component, 
the latter appearing as daily fluctuations of SN around the average. Both 
components have characteristic distributions. The basic models used in ex- 
plaining the long-term (on scales of months and above) behaviour of the solar 
cycle are the dynamo models (Proctor and Gilbert, 1994). It was pointed out 
that this model may exhibit chaotic behaviour. Letellier et al. (2006) discuss 
such a chaotic mechanism. Their conclusions are that it has low dimension- 
ality and similar behaviour to a Rossler system of differential equations. On 
the other hand, Aguirre, Letellier, and Maquet (2008) model the solar cycle, 
transformed into a symmetrical space, through an autoregressive model with 
deterministic terms and stochastic residuals with Gaussian distribution. The 
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model obtained has a chaotic attractor similar to that of the Rossler system. 
Aguirre, Letellier, and Maquet point out that the sunspot number time 
series is nonstationary. Nordemann (1992) approximated the time series of 
yealy averages of SN with an exponential function separately for the rising 
and declining phases. The exponential dependence indicates a power law 
distribution of the SN yearly averages for each phase. For larger time scales 
the distribution may deviate from the power law. 

The stochastic part of the SN time series has been analysed both in 
connection to a chaotic dynamo and as a separate term. In many studies 
exponential distributions and dependences were found. Pontieri et al. (2003) 
build a dynamo model with fractional Brownian motion in order to explain 
the stochastic character of the SN time series. They determine the Hurst 
exponent of SN to be ~ 0.76 on time scales from 20 to 350 days. This 
indicates a Brownian motion with positive autocorrelation at work at these 
time scales. Noble and Wheatland (2011) develop a model for the stochas- 
tic part of the SN time series which is represented as a diffusion process 
described by the Fokker-Planck equation. The solution of this equation has 
an approximate exponential form. They find that at solar minima the SN 
index has a distribution with exponential tail. They also point out that 
the daily SN distribution is approximately exponential for the period 1850- 
2010. Greenkorn (2009) looked at daily SN data in order to determine the 
type of flow in the Sun's convective layer. The SN values indicate stochastic 
behaviour for Cycles 10-19 and chaotic behaviour for Cycles 20-23, the last of 
which presents an increase in the stochastic character compared to previous 
cycles. 

Salakhutdinova (1998; 1999) analysed the SN time series by separating it 
into a regular (deterministic) low frequency component and a stochastic high 
frequency component. The stochastic component has properties of white 
(Gaussian) noise at time scales lower than two months and for time scales 
between two months and two years it has properties similar to pink (flicker) 
noise. The two kinds of properties may characterize two sub-components that 
combine to form the stochastic part of SN. Generally, for large time scales 
the stochastic component can be associated with a chromatic noise with 
sharp maxima. The regular component behaves as a nonlinear oscillator, 
with similarities to a Rossler system. Lepreti et al. (2000) compute the Hurst 
index for the SN index and the Ha flare index for the period 1976-1996. 
The two indices have similar Hurst values (0.76 and 0.74) on intervals of 
20-350 days and 24-450 days respectively. These show the presence of long- 
range autocorrelations for the two solar activity indices for the period 1976- 
1996. The authors determined the distribution of normalised fluctuations 
of the SN and Ha flare indices. The fluctuations were computed as 1-day 
differences of the respective indices. Next, these differences were normalised 
by subtracting their average value and then dividing by their root mean 
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square value. For both indices, the probabiHty density functions are stretched 
exponentials of the form p{x) = Aexp (— ajxl*"). For SN they found r ~ 1.04, 
which is very close to a Laplace distribution and for the flare index r ~ 0.66. 

Vecchio et al. (2005) analyse the coronal emission time series at 530.3 
nm from 1949 to 1996 with the Proper Orthogonal Decomposition method. 
The method uses a series expansion of the coronal emission relative to some 
basis functions by means of coefficients which depend on time. The authors 
determined the distribution of the 1-day difference of each coefficient's val- 
ues. The probability density function is similar to a stretched exponential 
for the central part of the distribution, where r is close to 1. 

Kanazir and Wheatland (2010) model the flare energy distribution and 
waiting times for two solar active regions spanning each about one month. 
They find that the flare frequency-size index has a power law distribution, 
while the waiting times have an exponential distribution. The parameter of 
the exponential distribution may suffer a jump in time, as found for one of 
the two active regions. This jump is explained in a jump-transition model by 
corresponding step changes in the rates of energy input and flare transition. 
Nevertheless, between step changes the waiting times distribution is a stable 
exponential distribution. 



2. Empirical Data Approximations 

In order to obtain the distribution of DSN, daily data from Cycles 10 - 
23 were used. The values of DSN range from -91 to 112. The empirical 
values of DSN were computed from the registered SN values and then the 
empirical probability p(DSN) was determined by counting the occurrence 
of each value of DSN. The probability for each DSN is presented in Figure 
2. By applying a least squares fit to In p{DSN), it was found that p{DSN) 
can be approximated well with exponential functions (2) with the following 
parameters: m = -9.32 for DSN G [-60; -10] and m = 9.37 for DSN G 
[10; 60]. The goodness of fit is characterised by the correlation coefficient 
R ~ 0.99 in both cases. Outside these intervals p{DSN) diverges from the 
exponential fit. A special case is DSN = 0, for which the probability is 
much bigger than probabilities associated to other values of DSN. Thus, it 
appears that DSN = represents a special state. We propose a model to 
describe the general behaviour of DSN, according to which the probability 
density function (pdf) of values of DSN in the set of real numbers is of the 
form: 



p{DSN) = p{DSN = 0)6{DSN) + p{DSN < 0)p-{DSN) 

+p{DSN > 0)p+{DSN), (3) 
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Figure 2. (left) Empirical and Maximum Entropy distributions of the values of DSN . 
Note the high pdf value associated with DSN — 0; (right) Natural logarithm of the DSN 
distributions with exponential approximations of the Maximum Entropy pdf computed 
for \DSN\ G [10; 60]. 



where 6{-) is the Dirac distribution, p~{-), p^(-) are probabiUty density func- 
tions associated with negative and positive values of DSN respectively and 
p{-) are probabilities associated with the respective intervals of DSN. From 
the empirical distribution determined above, p{DSN = 0) = 0.172. Since 
p{-DSN) « p{DSN), we take p{DSN < 0) = p{DSN > 0) = 0.414. The 
density functions p~ , p"*" have been determined by the Maximum Entropy 
Method (see Appendix A). Note that the empirical probabilities p{DSN) 
can be taken as (smoothed) pdf values associated with the corresponding 
values of DSN since pdf is probability divided by corresponding interval 
and p{DSN) corresponds to an interval of DSN of length 1. 



3. Results 

3.1. Maximum Entropy distribution approximations 

We collected 55 882 empirical values of DSN. In order to determine the 
pdfs p~(-); P~^{') all values of were eliminated, according to model (3). We 
used A^- = 23 587 DSN values for p-{-) and N+ = 22 696 values for p+{-). 
The pdf estimate obtained is presented in Figure 2; also, its logarithm is 
represented along exponential fits computed by the least squares method for 
DSN G [-60; -10] and DSN G [10; 60] respectively. The fits have param- 
eters m = -9.51 for DSN £ [-60; -10] and m = 9.60 for DSN G [10; 60]. 
The goodness of fit is characterised by i? ~ 0.995. These are very close to 
the empirical data fits. We conclude that indeed there is an exponential law 
in the distribution of DSN. 

From both the empirical values and the MEM distribution it can be 
seen that the distribution of DSN changes abruptly from the exponential 
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approximations for small \DSN\. There are two local maxima at about 
DSN = —7 and DSN between 7 and 8. There may be an observational 
cause for this characteristic. When the SN index is computed from sunspot 
observations, each group of sunspots counts as 10 individual sunspots, thus 
it may be easier to observe a higher value, e.g. SN > 7 than smaller values. 
Thus, jumps from previous values of SN = to SN = 7 and back may be 
more probable than other jumps around the cycles' minima. 

For values of \DSN\ < 7 the MEM distribution registers a variation that 
is steeper than the same variation in the empirical probability. The values 
of pdf increase up to about \DSN\ = 1.3 and then drop a little. Whether 
this is a real feature is uncertain. It is possible that it represents a spurious 
feature of the MEM as it tries to recover a uniform distribution at the edges, 
where there are fewer effective constraints. 

For values \DSN\ > 60 the MEM pdf diverges from the exponential 
approximations, which is supported by the empirical distribution of DSN. 
At the edge of the interval for DSN considered for building the MEM pdf, 
a sharp increase appears. The latter is a spurious feature. In this area the 
conditions have very small values, such that the entropy exceeds the value 
of the quadratic error E (see Appendix). Thus, the optimisation is done 
by maximising the entropy, which shifts the distribution toward a uniform 
distribution. The entropy may also affect the divergence from the exponen- 
tial law for \DSN\ > 60; thus, we consider that the MEM pdf gives only 
a qualitative description for very high values of \DSN\. This is not very 
significant though, as the values of the pdf are below 10~^ in this region, far 
smaller than those close to DSN = 0. 

3.2. Time-scale Localisation of the Exponential Approximation 

In order to determine the time scale at which the exponential law is valid, the 
moving average and standard deviation of DSN were computed for different 
time intervals and their values were compared to theoretical values computed 
with (2) and m = ±9.35, where the sign was taken + for DSN < and 
— for DSN > 0. All computations were carried out for \DSN\ € [10; 60]. 
The results are shown in Figure 3. The best localisation, i.e. the smallest 
time interval for which the distribution of DSN is close to the theoretical 
distribution (2), is obtained for a time span of about 11 years or approxi- 
mately one solar cycle period. Cycles 10 and 11 stand out as anomalies, since 
their standard deviation is very different from the theoretical one even for 
large time intervals. For the Cycles 12-23, the standard deviation is close to 
the theoretical value with displacements of about 10% from the theoretical 
standard deviation. The results obtained for a 1 year time interval show 
where deviations from the exponential law occur. The average DSN has 
sharp peaks during cycles' minima. The standard deviation has maxima 
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and minima corresponding to the maxima and minima of solar cycles. Its 
maxima are close to the theoretical standard deviation, while the minima 
are far from it. It appears thus that deviations from the exponential law 
occur around minima of the solar cycles. This is partly a consequence of 
the fact that the minima are populated with only a few values of \DSN\ 
above 10; thus, the sharp behaviour is partly generated by fluctuations in 
the average and standard deviation of \DSN\. 

3.3. The Origin of the Exponential Law: Fluctuations vs. 
Deterministic SN 

The difference DSN between SN consecutive values at a fixed date can be 
reduced to two components: a smooth deterministic variation DSNdet and 
an error (or fluctuation) term DSN^rr, such that 

DSN = DSNdet + DSNerv (4) 

A similar relation holds between the variances of the terms implied: 
Var{DSN) = Var{DSNdet) + Var{DSNerr) 

+Cov{DSNdet,DSNerr), (5) 

where Var{-) represents the variance and Cof (•, •) the covariance of the 
respective quantities. We tested to see whether one of the terms DSN^et, 
DSNerr IS far more important than the other. If this were the case, it is 
expected that the negligible term would have negligible statistical charac- 
teristics and its distribution would have a negligible contribution in the DSN 
overall distribution. Since the averages of the DSN components above are 
close to 0, the test was carried on their variances. Moving variances on 
intervals of one year were used. The results are shown in Figure 4. It can 
be seen that Var{DSN) is at least 100 times greater than Var{DSNdet) 
or Cov{DSNdet, DSNerr), thus by far the most important component of 
DSN appears to be the fluctuation DSN^rr- The exponential distribution 
can be assumed to be given by DSN fluctuations and not its deterministic 
component. 



4. Conclusions 

The most important characteristic of the distribution of DSN is its almost 
perfect symmetry with respect to the vertical axis. The small differences in 
the MEM pdf between negative and positive values of DSN are not well 
supported by empirical values and we consider them spurious phenomena, 
due to the stochastic nature of the Simulated Annealing algorithm and to 
noise in the initial data. 
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Figure 3. Moving averages and standard deviations of DSN (black lines) compared to the 
1 year moving average SN shape (grey line - not to scale). Only values IDS'A'^I £ [10; 60] 
were considered in the computations. The statistical characteristics of DSN were com- 
puted on time intervals of (a) 1 year, (b) 5.5 years and (c) 11 years. The dashed lines 
represent theoretical average and standard deviation computed with (2) and m — ±9.35, 
sign opposite to that of DSN. The average SN describes the smoothed solar cycles' 
evolution. 



Overall, the distribution of DSN has several areas of specific variation: 
(i) For low values of \DSN\ the pdf decreases as \DSN\ increases, probably 
exponentially, but this is uncertain because of the small number of points. 
Around \DSN\ = 7 the pdf has a slight but abrupt rise, (ii) For \DSN\ 
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Figure 4- The time evolution of the variance components of DSN: variance of total DSN, 
variance of deterministic DSN (det) component as estimated from the one year average 
SN and the covariance of the deterministic (det) and error (err) components of DSN. 
The difference between the total variance and the latter two components represents the 
variance of the error component (not represented), which was found to be superposed on 
the total variance. All values were computed on moving one year intervals. 

between 10 and 60 it obeys an exponential law of an approximate form: 



with m ~ 9.35 experimentally and m ~ 9.56 in the MEM pdf; Z is a 
constant, (iii) For very large \DSN\ the pdf decreases somewhat faster than 
exponential with increasing \DSN\. 

The distribution (6) appears to be valid only when whole cycles are 
considered. For small time intervals, the DSN distribution may deviate 
from the exponential law. This happens especially around minima of the 
solar cycles. 

The presence of the exponential law dependence is interesting in itself. 
This result was previously found by Lepreti et al. (2000) for SN values 
spanning a 20 year period. Here the result is investigated in more detail 
and for a longer time period. The formula (6) represents a Laplace distri- 
bution centered at DSN = (Kotz, Kozubowski, and Podgorski, 2001). 
The deviations from this law happen for very low values of \DSN\, which 
occur especially (but not only) in the minima of solar cycles, and also for 
very high values of \DSN\, which occur at the maxima of solar cycles. The 
latter case happens only for a few values of the solar cycles analysed, i.e. 
for \DSN\ > 60, as can be seen from Figure 1. Thus, it can be concluded 
that overall the exponential law holds for all values of \DSN\ of statistical 
significance, except values around DSN = 0. The failure of the exponential 
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law for small \DSN\ may be due to the way the Sunspot Number {SN) is 
computed by counting both individual sunspots and groups of sunspots. This 
is especially significant for solar cycles minima, where abrupt jumps from 
SN = to SN ?a 7 may easily appear. Nevertheless, the value DSN = 
stays above the Laplace pdf (6). 

The obtained results can be applied in the field of stochastic predictive 
models for solar activity and space weather. The daily stochastic behaviour 
is superposed over a long-term cyclic characteristic. These daily fluctuations 
may be represented as a stochastic process with jumps distributed according 
to (6). Nevertheless, such a stochastic process would give only an average, 
long-term representation. It must be completed with additional terms that 
give the exact distribution at smaller time scales. These terms are yet to be 
determined. 

The Laplace distribution of DSN is linked to the exponential distribution 
of SN, as evidenced by Noble and Wheatland (2011). Indeed, by assuming 
that daily SN values are independent and identically distributed random 
variables with exponential distributions, the difference of any two SN vari- 
ables is a random variable with a Laplace distribution. However, the SN 
values are not independent on a day-to-day basis. Similarly, the Laplace 
distribution does not appear at a daily level, but rather at whole cycle 
levels. We hypothesize that the Laplace distribution of DSN appears at 
time intervals for which the beginning and end values of SN are completely 
independent. It is expected that the distribution is stationary at this time 
scale. This seems to be the case for 11 year intervals, as seen from Figure 
3, where both the average and standard deviation tend to become stable 
in time. Nevertheless, the standard deviation still has strong jumps even at 
this time scale, possibly showing abrupt changes in the Laplace distribution 
parameter m from one cycle to the next. The long time span at which 
stability occurs may be a characteristic of the magnetic flux tubes lying 
in the convective layer, which is preserved at least until the magnetic fleld 
changes polarity. As they emerge through the photosphere, the flux tubes 
reveal this characteristic in the form of the Laplace distribution of DSN. 
Nevertheless, other more complex processes are at work for small time scales, 
where the distribution is highly irregular. 



Appendix 
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A. MEM Distributions 



The Maximum Entropy approach for probabihty distributions consists in 
determining the distribution that maximises the entropy with some imposed 
constraints. We present the way it was apphed in the present work. Each 
pdf p~{DSN), p'^{DSN) was determined separately as sets of probabiUties 
p~ (DSNi), p~^{DSNj) associated with corresponding values DSNi < 0, 
DSNj > 0, i,j = 1,2, Np, which were next transformed into pdf values. 
This is a general approach to building a continuous pdf which eliminates 
the need for a prior model to fit p~{-), p~^i-)- The number Np of values of 
DSN considered in each case is much greater than the empirical number of 
probability values. The values DSN, DSNj where chosen at equal distances 
in the corresponding intervals. 

We illustrate below the path followed for DSN < 0; for DSN > we 
proceeded in a similar fashion. The Shannon discrete entropy was used: 



Next, some local constraints were imposed on the distribution. These were 
built with a Gaussian function: 



The points xf^ are equally distanced in the interval of DSN < 0. The method 
of Kernel Density Estimation uses similar functions for building a pdf esti- 
mate with good results (Rosenblatt, 1956; Parzen, 1962; Cranmer, 2000; Wu 
and Mielniczuk, 2002). The functions were averaged, yielding empirical 
values < fk >emp computed over N~ empirical values of x = DSN < 
collected from the 14 last cycles and also simulated averages < fk >sim 
computed with the values x = DSNi and their associated probabilities 
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A quadratic error was used: 
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When the error E is minimised, the obtained distribution presents a lot of 
noise. In order to smooth it, the entropy H was maximised along with the 
minimisation of E. Thus, the function to be minimised was taken of the 
form: 

Eh = E- kH, (12) 

where A: is a constant that regulates the degree of smoothing. For big k 
the distribution obtained is close to the uniform distribution, while for very 
small k the distribution has a lot of noise. Based on tests carried on some 
known distributions, a value of k = 10~^ was found to be sufficient. 

In order to find the minimum of Eh a modified version of the Simulated 
Annealing (SA) algorithm was used. In the original algorithm, a temperature 
parameter is decreased during the run in order to control the convergence 
towards the absolute minimum. We found this to be unnecessary, thus we 
set the temperature to 0. Also, in the original algorithm, at each step, the 
solution vector {p~ (DSNi))i is constructed by modifying all its elements at 
random at each step. We found this to be detrimental and the modification 
was carried on only one element chosen at random at each step. The rate of 
change of the vector elements was decreased as the algorithm proceeded. 

A number of Nq = 101 conditions were used in each case. An equal num- 
ber of values of , were generated in the intervals [—100; —1] and [1; 120] 
respectively. In each case the solution was a vector of Np = 1 000 probabili- 
ties {p~{DSNi))^, {p'^{DSNj))j, corresponding to equally distanced values 
of DSN set in the above intervals. Thus, in each interval there are about 10 
times more simulated probability values for the pdf than empirical values. 
The conditions' parameter a was chosen such that each condition covers a 
significant part of the interval of values of DSN. It vas taken equal to 1/30 
of the length of the interval in each case. This gives a width at half-height 
for each fk of about 10 units of DSN and a base coverage of about 20 units 
of DSN. The functions fk have maxima distanced at about 1 unit of DSN, 
thus yielding a dense set of conditions. This value of a was tested on some 
known distributions and was found satisfactory. 
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